SECTION I


INTRODUCTION 


1.1  Objectives 

The principal objective of this project is to evaluate and develop,
jointly with RADC/ESE, the concept of using charge coupled devices
(CCD's) in the mode of a multiple-level digital processor to perform
basic operations of finite-field digital arithmetic as applicable to
linear digital signal processing and error correction coding.

1.2  Background

Charge coupled device technology, fundamentally an analog signal
processing technology, finds wide and growing application to important
signal processing functions such as spectrum analysis, spread spectrum
matched filtering, and analog storage and integration. It is also an
attractive technology for reliable monolithic large scale integration
of binary digital logic functions because of its speed of operation,
high packing density, low power consumption, and relative simplicity
of structures for implementing binary logic.

A charge coupled device, being inherently a sampled analog device,
should be capable of operation as a multiple-level digital device if
means are provided to detect and refresh the discrete levels being
used. Such a capability makes possible the use of CCD's to accomplish
the defined operations of addition and multiplication in finite
algebraic fields, especially prime number fields. The concept was
originally formulated under MITRE IR&D and Technology Base programs in
FY'76. That work was reported, in conceptual application to error
correction coding, at the 3rd International Conference on the
Technology and Application of Charge Coupled Devices [6,7].

Under this project, the previously formulated concepts are being
developed further and expanded in application to the general area of
finite field digital signal processing. The task efforts emphasize
analysis, test and measurement of the multi-level digital signal
processing capabilities of state-of-the-art CCD's, leading to device
configuration and definition of LSI or VLSI chip architecture to
implement defined operations in prime number fields and their
extensions. Activities consist of theoretical analysis, laboratory
experimentation, test and measurement, demonstration, and
documentation. The work will culminate in recommendations for new
device development to be undertaken by RADC/ESE.

1.3  Scope

During Fiscal 1979, efforts were applied in the areas of analysis of
multiple-level digital error rates, laboratory measurements of
multi-level operation, and the definition and development of
processing structures. The accomplishments and status of work in
these areas are described in subsequent sections of this report.

This is an interim technical report, documenting activities and
results midway through the 30-month effort. The work reported
represents an average level of 0.5 MITRE technical staff per month.

The report is divided into two principal sections. Section II
describes efforts at analysis and measurement of the practical
limitations of multi-level digital CCD operation. Section III
describes a number of ideas and potential techniques for signal
processing applications of multi-level CCD devices. Since this is an
interim technical report and the work is continuing, conclusions and
recommendations are reserved for the final technical report, to be
published at the conclusion of the 30-month effort.


SECTION II


MULTIPLE-LEVEL  CCD  OPERATION 

A fundamental concern of this project is the ability of typical CCD
structures to operate on data consisting of sequences of discrete
digital signal charge levels representing signal values that assume
one of a finite number of integer levels. Each value may be
represented, for example, by an integer multiple of $Q_0$ elementary
charges, where $Q_0$ is the charge difference between readily
distinguishable levels in the charge transfer device. Typical
operations involved are charge injection, storage, transfer through
shift register stages, non-destructive sensing, charge summation,
charge splitting, and charge detection. These are the operations to
be expected in CCD circuit structures used in multiple-level digital
filtering. The ability of the CCD device or circuit to manipulate the
charge levels without altering them so much that wrong values are
assigned upon detection is critically important to the success of the
operation. Analog signal processing is more tolerant of noise and
distortion introduced by the device. Binary digital signal processing
involving Boolean logic operations reduces the problem to one of
distinguishing between a pair of levels which can be widely separated
to maximize the signal distance relative to the noise. Our work is
predicated on the assumption that the noise and distortion introduced
by the CCD can be low enough that the reduced signal distance
resulting from use of a larger, yet finite, number of levels can still
be adequate to perform useful signal processing functions. It is
expected that successful results of such processing methods may be
realized as reduced circuit complexity, fewer interconnections, and
greater reliability, all obtained with the attendant advantages of
simple fabrication, low power consumption, and small feature size.
Part of the key to success is to develop innovative uses of the
natural CCD operations such as cyclic shifting, transversal filtering,
and charge summation in order to implement the desired processing
functions.

But first it is necessary to establish the basic ability of the
(imperfect) device to perform the essential operations without
excessive distortion, and to determine the limits of this mode of
operation. We have attempted both analytical and experimental work
to answer these questions. The results of our efforts, at this
interim stage of the work, are described below.

2.1  Analysis of CCD Multiple Level Error Rates

A theoretical analysis was attempted to determine the average
probability of error in estimating the discrete value of a multiple
level digital signal observed at the output of a CCD shift register
structure. The analytical model included the effects of shift
register length (number of stages), charge transfer inefficiency, and
intrinsic noise sources. Buried channel device parameters were to be
considered. Initially, the analysis was to be based on a Gaussian
model of the noise distribution, as implied by use of the central
limit theorem. It is realized, however, that for small error
probabilities it is the tail of the distribution that is important,
and the Gaussian noise model may be inaccurate. Consequently,
improved physical and mathematical models of the noise processes are
in need of continual examination.

The  analysis  of  error  rates  should  consider  not  only  sampled 
linear  delay  lines  but  also  transversal  and  recursive  filter  structures. 
Digital  filter  structures  are  commonly  described  and  analyzed  by 
input/output  methods.  The  basic  operations  performed  by  a  linear 
filter  are  often  conveniently  described  in  the  Z-transform  domain  in 
which  the  filter  output  is  determined  as  the  product  of  the  Z-transform 
of  the  input  sequence  with  the  system  function  of  the  filter,  most 
often  given  as  a  rational  polynomial  in  the  Z-transform  variable.  For 
example, a simple delay line of N stages, each stage suffering a fixed
fractional loss $\epsilon$, has a system function given as

$$ H^{(N)}(Z) = \left[ \frac{(1-\epsilon)\,Z^{-1}}{1-\epsilon Z^{-1}} \right]^{N} \tag{1} $$

which for $\epsilon \ll 1$ can be approximated as

$$ H^{(N)}(Z) \approx Z^{-N} \exp\left[ N\epsilon\,(Z^{-1}-1) \right] \tag{2} $$


Such an approach has proven useful in assessing first-order effects
of dispersion and frequency response limitations of delay lines (and
filters) operating on sampled analog signals. One could next take
into account the effect of additive noise (referred to the output or
observed as an equivalent output noise) and construct a crude
first-order model to assess device performance. We have, in fact,
done this previously in the formulation of a simple Gaussian model,
the results suggesting the feasibility of the multi-level logic role
for CCD's [1].
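The quality of the small-loss approximation in equation (2) can be checked numerically. The sketch below evaluates both forms on the unit circle; the per-stage loss of 1e-4 is an assumed value chosen only for illustration, and the function names are hypothetical.

```python
import cmath
import math

def h_exact(n_stages, eps, omega):
    """Equation (1): N cascaded stages, each (1 - eps) z^-1 / (1 - eps z^-1)."""
    z_inv = cmath.exp(-1j * omega)
    stage = (1.0 - eps) * z_inv / (1.0 - eps * z_inv)
    return stage ** n_stages

def h_approx(n_stages, eps, omega):
    """Equation (2): z^-N exp(N eps (z^-1 - 1)), valid for eps << 1."""
    z_inv = cmath.exp(-1j * omega)
    return z_inv ** n_stages * cmath.exp(n_stages * eps * (z_inv - 1.0))

N_STAGES = 455   # stage count of the delay line tested in Section 2.2
EPS = 1e-4       # assumed per-stage fractional loss (illustration only)

for frac in (0.0, 0.25, 0.5):      # signal frequency as a fraction of the clock rate
    omega = 2.0 * math.pi * frac
    exact = h_exact(N_STAGES, EPS, omega)
    approx = h_approx(N_STAGES, EPS, omega)
    print(f"f/fc = {frac:4.2f}  |H| exact = {abs(exact):.6f}  approx = {abs(approx):.6f}")
```

The leading error of the approximation is of order N times the square of the loss, so the two forms agree closely whenever that product is small.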


In extending the results of such first-order modeling and analysis to
transversal and recursive filter structures, the presence of internal
feedback loops in the CCD device encumbers the analysis to the extent
that the system functions become extraordinarily complicated.
Although signal-flow-graph techniques can be readily applied to
construct a system function, the computational work involved in
reducing it to a practical and usable form does not seem worth the
effort, especially in view of a basic theoretical inadequacy of the
first-order physical model to accurately describe performance.

A basic problem with the first-order physical model of the sort
described is that the random processes that contribute to the charge
transfer inefficiency, the recombination charge (dark current) added
to each cell, and the uncertainty in sensing the transferred charge
at the delay taps are not properly taken into account. In fact, in
the first-order model the charge transfer loss is not even regarded as
a random process variable; only the average value is used, as a fixed
and invariant quantity. While the first-order model has proven
sufficient for describing CCD operation with analog signals, it is
doubtful that it can provide an adequate model for accurately
predicting digital error rates, especially for the multiple-valued
decision with which we are concerned.


Our  approach  to  this  problem  has  been  to  formulate  a  dynamical 
state-variable  model  of  the  CCD,  viewing  it  as  a  linear  sequential 
circuit  that  can  be  described  by  the  system  equations 

$$ x(k+1) = A(k)\,x(k) + B(k)\,F(k) \tag{3} $$

$$ y(k) = C'(k)\,x(k) \tag{4} $$


In this formulation the vector x(k) represents the state of the model
at the kth clock cycle. The matrix A(k) describes the unforced
operation of the circuit subject only to the initial state. The form
of A(k) depends on the circuit structure being analyzed (whether a
delay line, transversal filter, or recursive filter) and contains the
charge transfer loss parameters and also terms involving these
parameters in intrinsic feedback loops. In its most general form for
our applications A(k) is given as


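$$ A(k) = \begin{bmatrix}
\epsilon_N(k) & 1-\epsilon_{N-1}(k) & 0 & \cdots & 0 \\
0 & \epsilon_{N-1}(k) & 1-\epsilon_{N-2}(k) & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \epsilon_2(k) & 1-\epsilon_1(k) \\
\alpha_N[1-\epsilon_N(k)] & \alpha_{N-1}[1-\epsilon_{N-1}(k)] & \cdots & \alpha_2[1-\epsilon_2(k)] & \epsilon_1(k)+\alpha_1[1-\epsilon_1(k)]
\end{bmatrix} \tag{5} $$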


for a structure having N charge transfer cells. $\epsilon_n(k)$ is the
fraction of untransferred charge remaining in the nth cell after the
kth cycle and $[1-\epsilon_n(k)]$ is the fraction of charge
transferred. $\epsilon_n(k)$ is a random variable resulting primarily
from the probabilistic interface-state and bulk charge-trapping
phenomena. In the analysis we assume that this parameter is
statistically independent from one cell to another and from one clock
cycle to another and that all ensemble-average moments are stationary
on k (and from cell to cell). The coefficients
$\alpha_1, \ldots, \alpha_N$ are scale factors (or tap weights)
weighting the output of each cell in a recursive (LFSR) structure.
For a simple delay line, these coefficients would all be set equal to
zero. The matrix B(k) describes the dependence of the state of the
circuit on the external driving sources F(k) that in this model
include both the input drive and additive noise sources.* The
observation matrix C'(k) is chosen to express the observed output
vector y(k) as a function of the circuit state. The linear sequential
circuit represented by these equations is sketched in Figure 1.* (The
model does not include an output noise source, which can be included
in the usual manner.)
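As a concrete illustration of the state equation (3), the following sketch builds a state matrix for a short register and steps a single charge packet through it. It is a minimal sketch under stated assumptions: cells are indexed in natural order (index 0 is cell 1, the input cell), the drive is a scalar entering cell 1, and the loss is held at a fixed assumed value of 1 percent, whereas in the random model a fresh set of loss values would be drawn on every clock cycle. The names `build_a` and `step` are hypothetical.

```python
def build_a(eps, alpha):
    """State matrix for one clock cycle, cells ordered 1..N (index 0 is cell 1).

    eps[n]   : untransferred fraction of cell n on this cycle
    alpha[n] : feedback tap weight on the charge leaving cell n
               (all zero for a simple delay line)
    """
    n = len(eps)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] += eps[i]                      # charge left behind in cell i
        if i + 1 < n:
            a[i + 1][i] += 1.0 - eps[i]        # charge passed on to the next cell
        a[0][i] += alpha[i] * (1.0 - eps[i])   # transferred charge fed back to cell 1
    return a

def step(a, x, b, f):
    """One update x(k+1) = A(k) x(k) + B F(k), with a scalar drive F(k)."""
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) + b[i] * f for i in range(n)]

# Four-cell delay line (all tap weights zero) with an assumed fixed loss of 1 %.
N_CELLS = 4
EPS = [0.01] * N_CELLS
ALPHA = [0.0] * N_CELLS
B = [1.0, 0.0, 0.0, 0.0]                       # the drive enters cell 1 only

x = [0.0] * N_CELLS
for k, drive in enumerate([1.0, 0.0, 0.0, 0.0]):   # one unit packet, then nothing
    x = step(build_a(EPS, ALPHA), x, B, drive)
    print(k + 1, [round(v, 6) for v in x])
```

After four cycles most of the packet sits in the last cell, with the remainder smeared over the trailing cells; this is the dispersion that the loss terms on the diagonal produce.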

Our use of a state-space model rather than an input-output
description is a departure from what is usually encountered in digital
signal processing applications, but it seems necessary because of the
intrinsic feedback mechanisms and the further complication of the
random processes, which combine to make the model a non-stationary
one. As a consequence, the use of the model can be quite complicated.
In order to determine the response of the dynamical system described
by equations (3) and (4) we must provide an input signal that is
typical and then determine the corresponding output by solution of the
equations. Since the output sequence (as a function of the index k)
will be represented by a sequence of random variables, it is
appropriate to calculate at least their first and second order moments
in order to calculate error rates. The sufficiency of these moments
is based on the assumption that we can treat the output values as
Gaussian random variables, for which the mean, variance, and
covariances completely describe the process.

*F(k) = f(k) + v(k) + i(k)


2.1.1  Solution  of  the  System  Equations 

A solution to the linear difference equation (3) is given by the
variation of constants formula,

$$ x(k) = \Phi(k,k_0)\,x(k_0) + \sum_{i=k_0}^{k-1} \Phi(k,i+1)\,B(i)\,F(i) \tag{6} $$

where $x(k_0)$ is the initial state, and the transition matrix
$\Phi(k,k_0)$ satisfies the homogeneous matrix equation

$$ \Phi(k+1,k_0) = A(k)\,\Phi(k,k_0); \qquad \Phi(k_0,k_0) = I \tag{7} $$

and the composition law

$$ \Phi(k,k_1)\,\Phi(k_1,k_0) = \Phi(k,k_0); \qquad k_0 \le k_1 \le k \tag{8} $$

Direct substitution will verify that equation (7) is solved by the
transition matrix

$$ \Phi(k,k_0) = A(k-1)\,A(k-2) \cdots A(k_0) \tag{9} $$

From equation (4), we can express the output as

$$ y(k) = C'(k)\,\Phi(k,k_0)\,x(k_0) + C'(k) \sum_{i=k_0}^{k-1} \Phi(k,i+1)\,B(i)\,F(i) \tag{10} $$

with the transition matrix given by equation (9).
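The variation of constants formula can be checked against direct iteration of equation (3) on a small example. The sketch below uses arbitrary random 3 x 3 matrices rather than CCD state matrices, a constant B, and a scalar drive; these choices, and all function names, are assumptions made only to keep the check short.

```python
import random

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def matvec(p, v):
    """Product of a matrix and a vector."""
    return [sum(p[i][j] * v[j] for j in range(len(v))) for i in range(len(p))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transition(a_seq, k, k0):
    """Phi(k, k0) = A(k-1) A(k-2) ... A(k0); the identity when k == k0."""
    phi = identity(len(a_seq[0]))
    for i in range(k0, k):
        phi = matmul(a_seq[i], phi)      # newest factor multiplies on the left
    return phi

rng = random.Random(0)
N, K = 3, 6
a_seq = [[[rng.uniform(0.0, 1.0) for _ in range(N)] for _ in range(N)] for _ in range(K)]
b = [1.0, 0.0, 0.0]
f_seq = [rng.uniform(-1.0, 1.0) for _ in range(K)]
x0 = [0.5, -0.25, 2.0]

# Direct iteration of the state equation.
x_iter = x0
for k in range(K):
    x_iter = [ax + bi * f_seq[k] for ax, bi in zip(matvec(a_seq[k], x_iter), b)]

# Closed form: free response plus the sum of forced responses, with k0 = 0.
x_closed = matvec(transition(a_seq, K, 0), x0)
for i in range(K):
    forced = matvec(transition(a_seq, K, i + 1), [bi * f_seq[i] for bi in b])
    x_closed = [xc + fi for xc, fi in zip(x_closed, forced)]

max_diff = max(abs(p - q) for p, q in zip(x_iter, x_closed))
print("largest difference between the two solutions:", max_diff)
```

The closed form needs a fresh matrix product for every term of the sum, which is one reason iteration on k is the more practical route for numerical work.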


Once the output function has been calculated, probabilistic methods
can be applied to determine the average probability of incorrectly
estimating the output digital level. For example, if we take $Q_0$
(coulombs) as the maximum (full-well) charge and subdivide it into
q equal portions to represent a q-level signal, we can calculate the
probability of correct detection (PCD) by integrating the distribution
of the output from Rq - q/2 to Rq + q/2 for a value that is supposed to
represent the Rth level. The probability of incorrect detection is one
minus PCD. If we assume a Gaussian distribution at the output, it is
sufficient to calculate the mean value and variance of y(k) in
order to complete the integration (taking account also of additive
Gaussian noise at the output). We expect that these parameters of the
output distribution will change with the clock cycle k as a result of
dispersion and recombination noise increasing with time.
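Under the Gaussian assumption the integration described above reduces to a difference of two normal distribution functions. The sketch below is one reading of that calculation: the full-well charge is normalized to 1, the level spacing is taken as the full well divided by the number of levels, and the output noise of 1 percent of full well is an assumed figure, not a measured one. The function names are hypothetical.

```python
import math

def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution of a Gaussian with the given mean and sigma."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def pcd(level, spacing, mean, sigma):
    """Probability that the output falls within half a spacing of the nominal level."""
    lo = (level - 0.5) * spacing
    hi = (level + 0.5) * spacing
    return gaussian_cdf(hi, mean, sigma) - gaussian_cdf(lo, mean, sigma)

FULL_WELL = 1.0    # normalized full-well charge
SIGMA = 0.01       # assumed output noise, 1 % of full well (illustration only)

for levels in (4, 8, 16, 32):
    spacing = FULL_WELL / levels
    p = pcd(level=1, spacing=spacing, mean=1 * spacing, sigma=SIGMA)
    print(f"q = {levels:2d}   PCD = {p:.6f}   error = {1.0 - p:.3e}")
```

A mean that has drifted away from the nominal level, as dispersion would cause, can be explored by passing a shifted `mean`. For very small error probabilities the subtraction from one loses precision, which echoes the caution in the text about relying on the Gaussian tail.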

For a solution it is necessary to calculate the ensemble average
mean value and variance of equation (10). The work is made tractable
by the assumption of statistical independence of the random variables
from cell to cell and from cycle to cycle. In particular, we have
assumed that

$$ \overline{\epsilon_m(k)\,\epsilon_n(k)} = \overline{\epsilon_m(k)}\;\overline{\epsilon_n(k)}; \qquad m \ne n \tag{11} $$

and

$$ \overline{\epsilon_m(k)\,\epsilon_m(j)} = \overline{\epsilon_m(k)}\;\overline{\epsilon_m(j)}; \qquad k \ne j \tag{12} $$

where the overbar indicates an ensemble average.

We assume wide-sense stationarity from cell to cell, so that

$$ \overline{\epsilon_m(k)} = \bar{\epsilon} \quad \text{and} \quad \overline{\epsilon_m^2(k)} = \overline{\epsilon^2} \tag{13} $$

taking $\epsilon_m(k)$ as a random variable of a wide-sense stationary
random process. These assumptions allow us to express the ensemble
average outputs as

$$ \bar{y}(k) = \overline{C'(k)}\;\bar{x}(k) \tag{14} $$

where

$$ \bar{x}(k) = \overline{A(k-1)}\;\overline{A(k-2)} \cdots \overline{A(k_0)}\;x(k_0) + \sum_{i=k_0}^{k-1} \left[ \prod_{j=i+1}^{k-1} \overline{A(j)} \right] B(i)\,\overline{F(i)} \tag{15} $$

and where we have assumed statistical independence between the
random processes describing the charge transfer inefficiency and
the additive noises contributed by the recombination charge and
the tap-weight sensing uncertainty. We assume that the latter
processes are also wide-sense stationary so that we can express the
mean value of the output as

$$ \bar{y}(k) = \overline{C'}\,\bar{A}^{\,k-k_0}\,x(k_0) + \overline{C'} \sum_{i=k_0}^{k-1} \bar{A}^{\,k-(i+1)}\,B\,\overline{F(i)} \tag{16} $$

where  we  have  made  use  of  the  fact  that  B(k)  is  a  deterministic  and 
constant  matrix  in  our  model,  and  that  the  average  values  of  C'(k) 
and  A(k)  are  constant  matrices.  Observe  that  equation  (16)  could 
also  have  been  determined  by  first  taking  the  ensemble  averages  of 
equations  (3)  and  (4),  making  the  (ensemble  average)  state  matrix 
A(k)  a  constant  matrix,  and  then  applying  the  variation  of  constants 
formula.  Alternatively  the  solution  could  be  built  up  by  iteration 
on  k.  Regardless  of  the  method  used,  the  computation  is  formidable 
for  practical  values  of  N,  suggesting  a  numerical  calculation  in 
place  of  an  analytical  approach.  Even  a  direct  numerical  calculation 
will  be  extraordinarily  complicated  for  values  of  N  that  are  not 
trivially  small,  as  each  iteration  requires  multiplication  by 
N  x  N  matrices. 

Calculation of the output variance may be performed either by the
use of the variation of constants solution of equation (10) in the
appropriate statistical formulas in an attempt to develop a
closed-form solution, or by developing a difference equation
formulation aimed at an iterative numerical calculation. The first
approach leads to a complicated closed-form expression that is not
particularly useful. The result of the second approach is a matrix
difference equation for the variance,

$$ \sigma_x^2(k+1) = \overline{A(k)\,\sigma_x^2(k)\,A'(k)} + \overline{[A(k)-\bar{A}]\;\bar{x}(k)\,\bar{x}'(k)\;[A'(k)-\bar{A}']} + B\,\sigma_F^2(k)\,B' \tag{17} $$

$$ \sigma_y^2(k) = C'(k)\,\sigma_x^2(k)\,C(k) \tag{18} $$

where we have expressed

$$ \overline{x(k)\,x'(k)} - \bar{x}(k)\,\bar{x}'(k) = \sigma_x^2(k) \tag{19} $$
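The origin of the three terms in the variance recursion may be easier to see from the deviation of the state about its mean. The following sketch assumes, as in the text, that A(k) and F(k) are statistically independent of x(k) and of each other, and that B is constant; the tilde notation is introduced here only for this illustration.

```latex
\begin{aligned}
\tilde{x}(k) &= x(k) - \bar{x}(k), \qquad
\tilde{A}(k) = A(k) - \bar{A}, \qquad
\tilde{F}(k) = F(k) - \overline{F(k)} \\
\tilde{x}(k+1) &= A(k)\,\tilde{x}(k) + \tilde{A}(k)\,\bar{x}(k) + B\,\tilde{F}(k) \\
\sigma_x^2(k+1) &= \overline{\tilde{x}(k+1)\,\tilde{x}'(k+1)} \\
 &= \overline{A(k)\,\sigma_x^2(k)\,A'(k)}
  + \overline{\tilde{A}(k)\,\bar{x}(k)\,\bar{x}'(k)\,\tilde{A}'(k)}
  + B\,\sigma_F^2(k)\,B'
\end{aligned}
```

The cross terms drop out because each contains a zero-mean deviation that is independent of the factors multiplying it. The remaining overbars are averages over the random loss parameters in A(k) alone.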


A numerical solution for the variance can be developed from
equations (16), (17), and (18) by iteration on k. Numerical
calculation of the variance, for practical values of N, will be even
more complicated than calculation of the mean value. Terms of the
kind $A(k)\,\sigma_x^2(k)\,A'(k)$ require a pair of N x N matrix
multiplications followed by statistical ensemble averaging of the
scalar elements of the product matrix. Clearly, machine-aided
computation is necessary. Two approaches suggest themselves:

1)  Calculation, for small length N and sample value k, on a
    main frame computer using a suitable programming language
    (like APL) to perform the matrix operations.

2)  Use of the state equation directly to simulate the device
    as a dynamical system on a digital computer, using Monte
    Carlo techniques for random parameter selection and
    statistical averaging.

Both approaches for numerical analysis will be practically limited
by the available facilities and the computational effort required.
They have little advantage over direct measurement of an actual
device other than the ability to vary the parameters of the model.
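The second approach can be sketched in a few lines for the simplest structure, a plain delay line. Everything numerical here is an assumption made for illustration: the loss of each cell is drawn uniformly between zero and twice its mean on every cycle, the added noise per cell is Gaussian, the full-well charge is normalized to 1, and detection picks the nearest level. The function name is hypothetical and the figures printed are not predictions for any real device.

```python
import random

def simulate_delay_line(n_cells, levels, n_symbols, mean_eps, noise_sigma, seed=1):
    """Monte Carlo estimate of the level error rate of a lossy, noisy delay line."""
    rng = random.Random(seed)
    spacing = 1.0 / (levels - 1)               # full well normalized to 1.0
    symbols = [rng.randrange(levels) for _ in range(n_symbols)]
    x = [0.0] * n_cells                        # charge held in each cell
    errors = 0
    compared = 0
    for k in range(n_symbols + n_cells):
        # Fresh, independent loss value for every cell on every cycle.
        eps = [rng.uniform(0.0, 2.0 * mean_eps) for _ in range(n_cells)]
        out = (1.0 - eps[-1]) * x[-1]          # charge leaving the last cell
        if k >= n_cells:                       # first symbol emerges after N cycles
            detected = min(levels - 1, max(0, round(out / spacing)))
            errors += detected != symbols[k - n_cells]
            compared += 1
        drive = symbols[k] * spacing if k < n_symbols else 0.0
        new_x = [eps[0] * x[0] + drive]
        for i in range(1, n_cells):
            new_x.append(eps[i] * x[i] + (1.0 - eps[i - 1]) * x[i - 1])
        # Additive noise charge picked up in each cell (assumed Gaussian).
        x = [v + rng.gauss(0.0, noise_sigma) for v in new_x]
    return errors / compared

for levels in (4, 8, 32):
    rate = simulate_delay_line(n_cells=32, levels=levels, n_symbols=2000,
                               mean_eps=1e-4, noise_sigma=1e-3)
    print(f"levels = {levels:2d}   estimated error rate = {rate:.4f}")
```

A run of a few thousand symbols can only resolve error rates down to roughly one in a thousand, which illustrates the computational limit noted above: very low error rates need correspondingly long simulations.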

To summarize the status of our analysis, we have concluded that the
non-stationarity of the physical model makes inapplicable simple
computational techniques, like the Z-transform method, for
determination of multiple-level error rates. The techniques of
numerical analysis based on a dynamical state-variable model described
above seem limited to relatively short structures that can be handled
by brute-force machine computation. Since the outcome and ultimate
usefulness of such an analysis remain in doubt, we have de-emphasized
analysis in favor of experimental measurements. Furthermore,
first-order models, calculations, and previous limited experiments
suggest low error rates for a small number, say 3 or 5, of digital
levels. Moreover, in considering structures for finite-field
operations we conjecture that some very useful operations can be
performed in circuits that utilize only a few levels. Some examples
will be discussed in Section III.

2.2  Laboratory  Measurement  of  CCD 
Multiple  Level  Error  Rates 

The objective of the laboratory tests and measurements to be
described is to assess the ability of charge-coupled devices (CCD's)
to accommodate multiple digital charge levels at low error rates
in detecting the valid level. The basic parameter of CCD performance
that must be determined is the number of discrete amplitude levels
that can be processed and correctly detected for a given device. The
percentage of correct detections (PCD) is the criterion that will be
used to compare the performance of different devices. It is also
easily related to the error rate, which is 1-PCD.

The PCD can be expressed as a function of three basic operating
parameters. These three parameters are: the number of discrete
levels that exist within the usable dynamic range of the device, M;
the clock frequency of the CCD, $f_c$; and the ratio of the input data
rate, $R_D$, to the sampling rate, $R_S$, of the CCD. These operating
parameters indirectly affect the PCD, which is determined ultimately
by the charge transfer inefficiency, dynamic range, and intrinsic
noise of a given device. Our experimental test facility was designed
to enable the PCD to be determined as a function of M, $f_c$, and
$R_D/R_S$.

Different methods of performing laboratory tests and measurements to
assess the ability of CCD's to accommodate multiple digital charge
levels at low error rate were examined. As a first step in the
laboratory program, some effort was expended on careful set-up and
operation of a Fairchild CCD-321 dual 455-stage CCD shift register.
This is a buried channel CCD that at the time of work represented the
best commercially available device of this type. After the operation
of the device became thoroughly understood and spurious noise effects
were suppressed, an experiment was organized to specifically measure
the PCD, or equivalently the error rate, as described below.

2.2.1  Test Circuitry Description


The test circuitry created to determine the PCD for a given device
is modular in form and is capable of adaptation to measurement of
almost any CCD delay line. A schematic representation of this
facility is shown in Figure 2.

The test circuitry consists of a multi-level digital code generator,
error detection circuitry, and various control clocks. The
multi-level digital signal is derived by digital to analog conversion
of the output of a pseudo-random noise (PN) sequence generator. The
output word length of this code generator can be varied to change the
number of discrete levels present. The PN generator is programmable
and controlled by switch selection. The various control clocks allow
simultaneous changes in the CCD sampling rate and the data rate. The
error detection circuitry is capable of producing both PCD statistics
and differential error signals.

The  PCD  is  obtained  by  comparing  the  multi-level  sequence 
generated  with  the  sequence  present  at  the  output  of  the  CCD  delay 
line.  The  comparison  is  made  by  a  window  comparator  circuit  whose 
interval  is  determined  by  the  resolution  desired.  When  the 
output  signal  is  detected  correctly,  a  pulse  is  generated  by  the 
comparator  which  enables  the  error  counter  to  accumulate  events  to 
determine  the  PCD. 
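The signal path just described, a PN generator feeding a converter, followed by a window comparison against the reference sequence, can be mimicked in software. The sketch below is an idealized model, not a description of the actual test circuitry: the 7-stage feedback shift register and its tap positions, the ideal converter, and the function names are all assumptions made for illustration.

```python
def lfsr_bits(taps, state, count):
    """Fibonacci shift register; taps are 1-based stage numbers, state a list of 0/1."""
    state = list(state)
    for _ in range(count):
        yield state[-1]                        # output is taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]        # shift by one stage

def multilevel_sequence(bits_per_word, n_words):
    """Group PN bits into words; each word value is one discrete level (ideal DAC)."""
    bits = lfsr_bits(taps=(7, 6), state=[1] * 7, count=bits_per_word * n_words)
    words = []
    for _ in range(n_words):
        value = 0
        for _ in range(bits_per_word):
            value = (value << 1) | next(bits)
        words.append(value)
    return words

def window_pcd(reference, measured, spacing, window):
    """Fraction of samples lying within half a window width of the reference level."""
    hits = sum(abs(m - r * spacing) <= window / 2.0
               for r, m in zip(reference, measured))
    return hits / len(reference)

# Eight levels (3-bit words); the "device" here only adds a small fixed offset.
reference = multilevel_sequence(bits_per_word=3, n_words=100)
spacing = 1.0 / 8
measured = [r * spacing + 0.01 for r in reference]
print("PCD with a 0.1 window:", window_pcd(reference, measured, spacing, 0.1))
```

Changing `bits_per_word` changes the number of levels, in the same way that the output word length of the code generator does in the description above. Narrowing `window` corresponds to demanding a finer resolution from the comparator.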

The  error  detection  circuitry  is  also  capable  of  comparing 
two  delay  lines  simultaneously.  An  Exclusive-OR  gate  combining 
the  outputs  of  the  two  window  comparators  produces  an  error  signal 
which  determines  the  number  of  times  the  two  devices  disagree  and 
whether  a  correct  detection  was  made.  A  differential  amplifier  is 
also used to produce an analog error signal.

2.2.2  Test Results

At the time of writing, the test facility is in the final stages of
construction and shakedown. The error detection circuitry, which
includes the window comparator, digital delay line, and PCD counter,
is 90% complete. The multi-level PN code generator is completely
functional and is being used along with a storage oscilloscope to
obtain some preliminary data while the test facility is being
completed.

Two commercially available CCD's (from Fairchild and Reticon) have
been obtained for testing purposes. These analog delay lines appear
to represent the best commercially available devices of this type but
are not as suitable as other devices under development (RCA's RSAM,
for example). The CCD presently under test is the CCD-321A video
delay line produced by Fairchild. This device contains two buried
channel 455-stage analog shift registers. However, it requires the
inconvenience of four-phase clocking.

Several multi-level digital sequences with M = 8 discrete levels
were sampled by the delay line at a sample rate $R_S$ = 2.5 MHz.
The data rate $R_D$ of the generated sequence was then varied to
produce different ratios of oversampling. Photographs for $R_S/R_D$
ratios of 32, 16, and 8 can be found in Figures 3, 4, and 5
respectively. The change in the data rate (versus a constant
sampling rate) is shown by the change in the horizontal time axis
of each photograph.

It  can  be  seen  from  these  photographs  that  the  changes  in 
data  rates  within  the  range  examined  seem  to  have  little  effect  on 
the  transmission  of  data  through  the  CCD  delay  line.  However, 
one  can  see  that  the  presence  of  clock  feedthrough  on  the  output 
signal will probably cause errors to occur in the detection process.

Figure 3.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz,
M = 8 and $R_S/R_D$ = 32 (traces labeled Output and Input)

Figure 4.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz

Figure 5.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz,
M = 8 and $R_S/R_D$ = 8

Figure 6.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz,
M = 32 and $R_S/R_D$ = 32


The effects of the ambiguities created by the clock feedthrough
can be clearly seen in Figures 6, 7, and 8. This noise source
causes two sequential levels to overlap, as can be seen in Figure 6.
This ambiguity is present for M = 32 and $R_S/R_D$ values of 32, 16,
and 8. Several techniques are being explored to eliminate this
source of noise. The two most promising methods are additive
cancelling of the clock and lowpass filtering. Both techniques will
be used.

2.2.3  Continuing  Test  Plans 

The testing of the Fairchild, Reticon, and other available
CCD's  is  ongoing.  At  this  time,  we  are  concentrating  our  resources 
toward  the  completion  of  the  test  facility.  While  the  test 
facility  is  being  completed,  we  are  probing  the  optimum  operation 
of  the  devices  being  tested.  The  parameters  of  charge  transfer 
efficiency,  dynamic  range,  and  frequency  response  are  being 
determined  for  each  device.  These  preliminary  device  evaluations 
should  help  us  to  better  understand  the  operation  of  each  as  a 
multi-level  digital  delay  line.  The  procedures  developed  will 
help  determine  the  optimum  range  of  the  input  and  clock  biasing 
for  proper  multi-level  operation. 

After completion of the test circuitry, statistical data will be
gathered on the performance of each device when used to process
multi-level digital signals. The probability of correct detection
statistics will be examined and plotted as a function of M, $f_c$,
and $R_S/R_D$. We will determine what effects charge transfer
efficiency, dynamic range, and the number of device stages have on the
PCD for a given device, with the results analyzed and applied to
more complicated processing structures.

Figure 7.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz,
M = 32 and $R_S/R_D$ = 16

Figure 8.  Input/Output Multi-Level Sequence for $R_S$ = 2.5 MHz,
M = 32 and $R_S/R_D$ = 8


SECTION  III 


MULTIPLE-LEVEL  CCD  DIGITAL  SIGNAL  PROCESSING  FUNCTIONS 
AND  OPERATIONAL  STRUCTURES 

Given  that  multi-level  error  rates  for  state-of-the-art  CCD's 
are  sufficiently  low,  we  must  still  devise  efficient  monolithic 
structures  to  perform  the  needed  operations.  Work  elsewhere  is 
concerned  with  the  use  of  CCD's  for  multiple-valued  logic  operations 
based  on  extended  Boolean  logic.  Our  work  is  based  on  operations  in 
finite  algebraic  fields  or  rings  for  which  circuitry  needs  to  be 
developed  to  carry  out  the  basic  algebraic  operations  of  addition, 
multiplication,  and  inversion.  We  previously  observed  that  prime- 
field  multiplication  can  be  performed  by  cyclic  permutation  of  the 
multiplicative  group  of  the  field  and  that,  similarly,  addition  can 
be  carried  out  by  cyclic  permutation  of  the  additive  group  [1].  But 
most  signal  processing  functions  carried  out  in  finite  fields  will 
require  extension-field  operations  to  be  performed,  equivalent  to 
operating  with  polynomials  defined  over  the  prime  field.  Although 
two-dimensional  array  multipliers  organized  in  binary  trees  have 
previously  been  advocated  for  standard  elements  to  carry  out  the 
operation,  our  view  of  the  approach  is  that  it  tends  to  expand  the 
hardware  complexity  [2].  This  approach  has  the  attendant  risk  of 
decreased  circuit  reliability  and  increased  cost,  compensated  by  the 
ease  of  field-programmability  and  the  potential  for  incorporating 
some  degree  of  fault-tolerance  through  structural  redundancy.  But 
basically the finite-field array multiplier approach seems not well
suited  to  typical  CCD  operations,  although  the  option  should  be  kept 
open  for  further  exploration. 

Below we discuss some of our ideas for carrying out Galois field
operations with reference to the typical signal processing operations
of discrete transformation and cyclic convolution. Our object is to
devise structures in which the natural CCD functional operations can
be used to advantage; consequently there is some emphasis on
shift-register-like structures. Our ideas at this stage are tentative and
exploratory, and certainly in need of further development (or selective
abandonment). They are also intended to suggest some useful
test structures for exploratory device development and fabrication.

3.1 Galois Field Multiplication by Feedback Shift Registers

Earlier work showed that computations in the base field GF(p)
could be performed by cyclic permutation of the elements of the
additive or multiplicative groups of the field, and simple circuitry
using the Fairchild CCD-311 was configured to demonstrate the principle
for p = 5 [1]. It was largely this result that prompted us to
investigate further the capacity of a CCD to unambiguously store and
manipulate charge samples that represent distinct elements of GF(p).
Similar techniques can be used for the extension-field operations.

It is well known that multiplicative operations in GF(2^m), such
as scaling by a fixed element α^k of GF(2^m), raising to powers
(α^k)^n = α^(kn), and multiplying two variables α α', can be performed
by linear sequential circuits in which the arithmetic operations are
carried out in the prime field GF(2). Under this project we have
examined the generalization to p ≠ 2, with the result that similar
circuits can be devised in GF(p^m) where p ≠ 2 is a prime number.
For example, it is possible to multiply an element α^k of GF(p^m) by
a fixed element α^i by shifting the data sample α^k (once) in an
m-stage linear sequential circuit whose feedback and feedforward
connections are determined by the scale factor α^i. The connection
matrix for the circuit can be determined easily from the
field-generating recursion, which the matrix must also satisfy, with the
result that it can be written down by inspection. As an example, we
have drawn in Figure 9 a set of shift registers that can be used for
multiplying by the elements of GF(5^4); only a few are actually shown
for purposes of illustration. Each of these circuits forms the
required product as the contents of the register in a single data
shift. The operations of addition and multiplication are carried
out in the prime base field (modulo 5). In addition to defining sums
in GF(5), it is also necessary in the structures shown to implement
scalar multiplication by the elements of GF(5).

Figure 9. SCALING MULTIPLIERS IN GF(5^4) (shift registers and their connection matrices)

If we work with an extension field of the form GF(3^m),
then the operations of addition and multiplication of the
base field elements are further simplified since the elements of
GF(3) can be represented as 0, 1, -1. Consequently, non-zero
multiplication is achieved either by sign inversion or non-inversion
of the signal. A set of multipliers that implement scalar
multiplication by the elements α^k of GF(3^4) is shown in Figure 10 using
this representation for the elements of the base field.

Since there are p^m - 1 non-zero elements in GF(p^m) (80 elements
for GF(3^4)), one might expect to require the same number of registers
to multiply data by all the field elements in parallel in a single
clock cycle. Actually only m-1 linearly independent registers would
be necessary to generate all of the products, provided their outputs
are appropriately combined. Notice for example that:

\[ F_{\alpha^{67}} = F_{\alpha^{1}} + F_{\alpha^{2}} \quad\text{and}\quad F_{\alpha^{68}} = F_{\alpha^{2}} + F_{\alpha^{3}} \tag{20} \]

so that the results of multiplying by α^67 can be obtained by multiplying
separately by α^1 and α^2 and adding the products. Also, the existence
of unique additive inverses can be used to reduce the number of
separate registers; for example α^2 = -α^42, or equivalently
-F_{α^42} = F_{α^2}, so that multiplication by α^42 can be accomplished
by first multiplying by α^2 and then complementing the output. In this
way each pair of registers can be used to generate at least 5 products
on one clock cycle.
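The relations of equation (20) can be checked numerically with a short sketch. It assumes the Table II generator α^4 + α^3 + α^2 - α - 1 = 0, stores -1 as 2, and keeps coefficients low-order first; the function names (shift, power, connection_matrix, apply) are illustrative and do not come from the report.

```python
P = 3                      # field characteristic
REDUCE = (1, 1, 2, 2)      # alpha^4 = 1 + alpha - alpha^2 - alpha^3, with -1 stored as 2

def shift(v):
    """One clock of the 4-stage feedback register: multiply v by alpha."""
    top = v[-1]
    moved = (0,) + v[:-1]
    return tuple((m + top * r) % P for m, r in zip(moved, REDUCE))

def power(k):
    """Return alpha^k as a 4-tuple over GF(3), low-order coefficient first."""
    v = (1, 0, 0, 0)
    for _ in range(k):
        v = shift(v)
    return v

def connection_matrix(k):
    """Columns are alpha^k * alpha^j for j = 0..3 (the matrix F of alpha^k)."""
    cols, v = [], power(k)
    for _ in range(4):
        cols.append(v)
        v = shift(v)
    return cols

def mat_add(f, g):
    return [tuple((x + y) % P for x, y in zip(a, b)) for a, b in zip(f, g)]

def mat_neg(f):
    return [tuple((-x) % P for x in col) for col in f]

def apply(f, v):
    """Form the product as a GF(3) combination of the matrix columns."""
    out = (0, 0, 0, 0)
    for coeff, col in zip(v, f):
        out = tuple((o + coeff * c) % P for o, c in zip(out, col))
    return out

# Equation (20): F(alpha^67) = F(alpha^1) + F(alpha^2)
print(connection_matrix(67) == mat_add(connection_matrix(1), connection_matrix(2)))
```

Under these assumptions the same check holds for α^68, and connection_matrix(42) equals the negated connection_matrix(2), which is the complementing shortcut described above.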


3.2  Galois  Field  Addition 


Addition in GF(p^m) may be considered as the addition of polynomials
of degree m-1 having coefficients in the prime field GF(p).
The addition is carried out by adding (modulo p) the coefficients of
the variables of the same degree. Unlike the addition of binary
n-tuples corresponding to radix 2 numerical representation, there is
no carry operation. The operation is the same as Cartesian addition
of m-dimensional vectors.
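As a minimal illustration of carry-free addition, the sketch below adds two elements of GF(3^4) held as 4-tuples over GF(3). The function name gf_add and the use of 2 to stand for -1 are assumptions made for the example.

```python
def gf_add(u, v, p):
    """Add two GF(p^m) elements given as m-tuples over GF(p); no carries propagate."""
    if len(u) != len(v):
        raise ValueError("operands must have the same number of components")
    return tuple((x + y) % p for x, y in zip(u, v))

# Each component is reduced modulo 3 independently of its neighbours.
total = gf_add((2, 1, 0, 2), (1, 2, 2, 2), 3)
print(total)
```

Because each position is reduced on its own, m adder cells can operate in parallel with no interconnection between them.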


3.2.1  Addition  Modulo  p 


One of the most important functions that needs to be developed
is addition modulo p. As discussed previously, we can treat addition
in GF(p^m) as m-vector addition over GF(p) in which the vector
components are added modulo p. Multiplication in GF(p^m) can be
implemented either by exploiting the cyclic property of the multiplicative
group (as shown in 3.1 above) or by performing serial multiplication
in which partial products (modulo p) are formed and then added (or
accumulated) by vector addition modulo p, as in the case of an array
multiplier. No matter how we partition the computation over GF(p^m),
it is inescapable that GF(p) adders will be required. Such an adder
can be developed by extending the notion of CCD digital logic
techniques employed by TRW for performing binary logic operations [3].
One such scheme for a GF(p) adder is described below.


The operation of addition modulo p for a pair of elements a, b
of GF(p) is defined by the simple rule:

\[ (a + b)_{\bmod p} = \begin{cases} a + b, & a + b < p \\ (a + b) - p, & a + b \ge p \end{cases} \tag{21} \]

where the operations on the right hand side of equation (21) are the
operations normally defined on the (infinite) set of all the integers.

A  CCD  structure  that  executes  this  operation  is  shown  in  Figure  11;  it 
represents  a  slight  variation  of  the  cellular  structure  used  by  TRW 
for  a  binary  Exclusive-OR  gate  [3].  The  operation  of  the  suggested 
adder  can  be  described  by  the  following  sequence  of  events,  involving 
charges  that  exceed  the  bias  (zero)  level. 

1. On the first clock pulse, charge packets stored under
electrodes A and B are transferred and combined under
electrode D. The charge exceeding the controlled value p
flows over the barrier and accumulates under electrode C.

2. On the next pulse, the charge packets residing either under
gate C or gate D are transferred to the region under
electrode E, depending on the states of the transfer electrodes
T and T̄. The T electrode is controlled by the element sensing
the charge under electrode C, the presence of charge under C
inhibiting the transfer from D. The charge under C is
transferred to E in either case, being either zero or data.

3. On the subsequent pulse the charge on electrode E is sensed
and the electrodes C and D are preset to the zero level by
transferring their remaining charge to a diode charge sink.
The charge packets representing the next set of values to
be added are transferred to electrodes A and B and the cycle
is ready to repeat.

If a + b ≥ p then (a + b) mod p is transferred to electrode C during
the first third of the cycle and is transferred to electrode E in the
second step. The charge remaining under D must equal the modulus value
p, but is prevented from further transfer due to the charge present
under C. If a + b < p then no charge is transferred to electrode C
(by overflowing the barrier) and the value (a + b) mod p resides under
D after the first step, and is transferred from D to E on the second
step, the transfer gate T now permitting the transfer since no charge
is sensed under C. On the third part of the cycle, the residue
(a + b) mod p is sensed on electrode E and the other gates are
re-initialized.

Figure 11. MODULO-P ADDER CELL
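The three-step cycle can be followed with an idealized charge-bookkeeping sketch. It assumes one unit of charge per level, a perfect barrier at the value p, and perfect sensing; the function name ccd_mod_p_add is hypothetical, and the model is a literal reading of the steps rather than a device simulation.

```python
def ccd_mod_p_add(a, b, p):
    """Idealized bookkeeping for one cycle of the two-input modulo-p adder cell."""
    if not (0 <= a < p and 0 <= b < p):
        raise ValueError("inputs must be levels 0 .. p-1")
    total = a + b            # step 1: packets from A and B combine under D
    d = min(total, p)        # D retains at most the controlled value p
    c = total - d            # any excess spills over the barrier into C
    # step 2: charge sensed under C inhibits the D -> E transfer
    e = c if c > 0 else d
    # step 3: E is sensed; C and D are dumped to the charge sink
    return e

for a, b in [(1, 2), (3, 4), (4, 4)]:
    print(a, b, ccd_mod_p_add(a, b, 5))
```

In this literal reading the case a + b = p leaves exactly p under D and nothing under C, so the packet delivered to E is the full value p rather than the zero level. The sketch returns that value unchanged, which is congruent to zero modulo p but would need separate handling at the sensing stage.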

The technique described can be modified to configure a 4-input
adder. The operation of such a structure, shown in Figure 12, can
be described by the following sequence of operations:

1. Charge packets stored under electrodes A1 and B1 are
transferred to electrode C1; charge in excess of the
modulus value p is allowed to flow over the controlled
barrier and accumulate under E. Simultaneously, the
charge packets under A2 and B2 are transferred to C2
with the charge in excess of p allowed to flow over the
barrier and accumulate under E. Any charge accumulated
under E that exceeds the modulus value p is allowed to
cross the barrier and accumulate under F.

2. After completion of step (1) any charge packets residing
under C1 are transferred to C2 by enabling the appropriate
transfer gate. Again, charge in excess of the modulus value
flows over the barrier to E, and any charge in E that
exceeds p flows over the subsequent barrier and settles
under electrode F.

3. In the next step, the charge under E is sensed to either
permit or inhibit the transfer of charge from C2 to E, the
presence of non-zero signal charge in E inhibiting
the transfer.

4. In the final step, the charge under F is sensed to either
permit or inhibit the transfer of charge from E to F by
control of the transfer gate. Non-zero signal charge in F
prohibits the transfer. The charge under F at the completion
of this step is sensed to determine the value
(A1 + B1 + A2 + B2) modulo p.


Figure  12.  4-INPUT  MODULO-P  ADDER 
In order to verify the correct operation of the scheme just
described, it is convenient to display the various levels of charge
stored under the electrodes after the second step. These are dependent
on the values of the summed charge packets as shown in
Table I. The last condition motivates step 3 above; afterwards the
sum is stored either under electrode E or F and is sensed at the
completion of step 4.

The  scheme  outlined  can  be  developed  into  an  8-input  adder  by 
providing  an  additional  charge  accumulating  cell  and  controlled 
barrier.  Such  an  adder  is  shown  schematically  in  Figure  13. 

The  adders  described  schematically  are  presented  as  exploratory 
ideas.  The  actual  details  of  clocking,  formation  of  potential 
barriers,  and  sensing  techniques  need  to  be  examined  more  closely  in 
order  to  assess  the  feasibility  of  the  scheme. 

3.3 Fast Transform Structures

The work being carried out under this project was motivated
originally by the prospect of devising simple structures, based on
multi-level CCD operation, for decoding Reed-Solomon error-correcting
codes. It was previously established that such codes could actually
be designed over GF(p) where p is a prime number greater than 2. The
advantage seen was that the arithmetic operations would be performed
in the base field GF(p) rather than in some extension field GF(p^m),
with the result that the hardware could be simplified if CCD
multi-level digital processing could be used. In order to make the codes
useful it would be necessary for p to be reasonably large, say p = 17
or p = 31, thus prompting the examination of CCD operation with such
numbers of discrete amplitude (charge) levels.

Lately, we have come to believe that it may be more useful to
work in an extension field where both the characteristic p and
the degree of extension m are small. For example, we might choose
p = 3 and m = 4 to design a Reed-Solomon code having a block length
of p^m - 1 = 80 symbols of 3 levels each. Our change in direction is
prompted by several factors. First of all, it is becoming apparent
that the extension field operations are not overly complicated when
the field characteristic is small, as was discussed above for the
multiplier structures. Secondly, the requirements for multi-level
CCD operation are reduced to levels for which high reliability is
evident. Finally, the use of such extension fields admits the use of
fast transform techniques that can be effectively employed in a
Reed-Solomon decoding algorithm, and probably in other digital signal
processing applications as well.

TABLE I

INTERMEDIATE STORED CHARGE LEVELS

CONDITION*                                              C1   C2                 E                  F

A1+B1 ≥ p, A2+B2 ≥ p and [A1+B1]p + [A2+B2]p ≥ p        ε    p                  p                  [A1+B1+A2+B2]p
A1+B1 ≥ p, A2+B2 ≥ p and [A1+B1]p + [A2+B2]p < p        ε    p                  [A1+B1+A2+B2]p     ε
A1+B1 ≥ p, A2+B2 < p and [A1+B1]p + [A2+B2]p ≥ p        ε    p                  p                  [A1+B1+A2+B2]p
A1+B1 ≥ p, A2+B2 < p and [A1+B1]p + [A2+B2]p < p        ε    p                  [A1+B1+A2+B2]p     ε
A1+B1 < p, A2+B2 ≥ p and [A1+B1]p + [A2+B2]p ≥ p        ε    p                  [A1+B1+A2+B2]p     ε
A1+B1 < p, A2+B2 ≥ p and [A1+B1]p + [A2+B2]p < p        ε    p                  p                  [A1+B1+A2+B2]p
A1+B1 < p, A2+B2 < p and [A1+B1]p + [A2+B2]p ≥ p        ε    p                  [A1+B1+A2+B2]p     ε
A1+B1 < p, A2+B2 < p and [A1+B1]p + [A2+B2]p < p        ε    [A1+B1+A2+B2]p     ε                  ε

* [Ai + Bi]p : (Ai + Bi) modulo p
ε = bias charge only

A discrete transform can be defined over GF(p^m) that is analogous
to the discrete Fourier transform (DFT) defined over the field of
complex numbers [4]. This transform is interesting in its own right for
conceptual reasons and also because it exhibits the cyclic convolution
property, which can be useful to evaluate the convolution of two
sequences by transform techniques in which the product of the
transforms produces the transform of their convolution. The other
well-known Fourier transform properties are also useful computationally.

In  addition,  the  discrete  Fourier  transform  is  strongly  linked 
with  the  realization  of  digital  filters  that  implement  a  rational 
transfer  function  [5].  More  recently,  techniques  of  finite  algebra 
have  been  applied  to  the  design  of  digital  filters  in  a  manner  that 
overcomes  some  of  the  limitations  (approximation  error,  roundoff 
error,  instability)  of  digital  filter  design  [6].  One  rather  general 
approach  to  finite-field  digital  filter  synthesis  realizes  the  filter 
(in each field representation) as the weighted sum of the coefficients
of  the  moving  window  discrete  transform  of  the  input,  as  shown 
schematically  in  Figure  14. 
Sensing the importance of the discrete transform function, we
have expended some effort on this project to examine structures that
implement the transform in the class of Galois fields GF(p^m) where
p > 2 is a prime number. We have found that a systematic fast
computational algorithm can be devised that, unlike the Winograd algorithm,
applies systematically to all such fields. This led us further to
examine the processing structures implied and the implications of the
required arithmetic operations with regard to the use of multi-level
CCD techniques. Some results of this work are described below, with
the inclusion of a specific example for clarity.

3.3.1  Transform  Definition 

Let a_0, a_1, ..., a_{n-1} be distinct elements of a finite algebraic
field GF(p^m), whose multiplicative group has order p^m - 1, and let b
be an element of order n. The linear transformation

\[ A_j = \sum_{i=0}^{n-1} a_i\, b^{ij}, \qquad j = 0, 1, \ldots, n-1 \tag{22} \]

is a mapping of n-tuples over GF(p^m) into n-tuples over GF(p^m). It is
assumed that n divides p^m - 1, the order of the multiplicative group of
the field, and for our purposes will be equal to it.
In that case the field element b is a primitive nth root of unity.

It can be shown for any integer r that

\[ \sum_{i=0}^{n-1} b^{ri} = \begin{cases} n, & r \equiv 0 \bmod n \\ 0, & \text{otherwise} \end{cases} \tag{23} \]

and the property can be used to verify by direct calculation that
the mapping that is inverse to that of equation (22) is the linear
transformation

\[ a_i = n^{-1} \sum_{j=0}^{n-1} A_j\, b^{-ji} \tag{24} \]

where n^{-1} = n = p^m - 1 (both reduce to -1 modulo p). Equations (22)
and (24) define a discrete transform pair over GF(p^m) and the operations
of addition and multiplication are defined in the same field. Addition
may be performed as modulo-p addition of the m-tuples that are the field
elements comprising the sum. Multiplication may be defined by addition
of indices of the field elements

\[ b^{r}\, b^{s} = b^{r+s}. \tag{25} \]

The transform pair of equations (22) and (24) is analogous to the
discrete Fourier transform pair for which b would be a complex nth
root of unity and the arithmetic would be defined in the complex
number field: in particular, the cyclic convolution property holds.
Fast computation algorithms, analogous to the FFT algorithms, can
also be applied.
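The transform pair and the cyclic convolution property can be exercised with a small sketch over GF(3^4), n = 80. It assumes the Table II generator α^4 + α^3 + α^2 - α - 1 = 0 with b = α, stores -1 as 2, and implements multiplication by addition of indices through a hypothetical log table; all names are illustrative.

```python
import random

P, N = 3, 80
REDUCE = (1, 1, 2, 2)          # alpha^4 = 1 + alpha - alpha^2 - alpha^3 (low order first)
ZERO = (0, 0, 0, 0)

def times_alpha(v):
    top = v[-1]
    moved = (0,) + v[:-1]
    return tuple((m + top * r) % P for m, r in zip(moved, REDUCE))

# EXP[k] = alpha^k, LOG maps a non-zero element back to its index k.
EXP = [(1, 0, 0, 0)]
for _ in range(N - 1):
    EXP.append(times_alpha(EXP[-1]))
LOG = {v: k for k, v in enumerate(EXP)}

def add(u, v):
    return tuple((x + y) % P for x, y in zip(u, v))

def mul(u, v):
    if u == ZERO or v == ZERO:
        return ZERO
    return EXP[(LOG[u] + LOG[v]) % N]      # multiplication by addition of indices

def transform(a, sign=1):
    """A_j = sum_i a_i b^(sign*i*j) with b = alpha."""
    out = []
    for j in range(N):
        acc = ZERO
        for i, ai in enumerate(a):
            acc = add(acc, mul(ai, EXP[(sign * i * j) % N]))
        out.append(acc)
    return out

def inverse(A):
    n_inv = pow(N % P, -1, P)              # n = 80 reduces to 2 = -1 in GF(3)
    return [tuple((n_inv * x) % P for x in y) for y in transform(A, sign=-1)]

def cyclic_convolve(a, d):
    out = []
    for k in range(N):
        acc = ZERO
        for i in range(N):
            acc = add(acc, mul(a[i], d[(k - i) % N]))
        out.append(acc)
    return out

rng = random.Random(1)
a = [tuple(rng.randrange(P) for _ in range(4)) for _ in range(N)]
print(inverse(transform(a)) == a)
```

The three-argument pow with a negative exponent requires Python 3.8 or later. A table-lookup multiplier is only one possible realization; it is used here because it mirrors the index-addition rule of equation (25).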

If the sequence to be transformed is expressed as a polynomial
over GF(p^m)

\[ a(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{26} \]

then the transform of the sequence a_0, a_1, a_2, ..., a_{n-1} is seen to
be identical with polynomial evaluation of a(x) at the n distinct
points b^0, b^1, b^2, ..., b^{n-1}, and the inverse transform is identical
with interpolation of the polynomial a(x) from its n values. The
evaluation can be written in nested form as

\[ a(b^j) = a_0 + b^j \bigl( a_1 + \cdots + b^j ( a_{n-2} + b^j a_{n-1} ) \cdots \bigr) \tag{27} \]

or equivalently it can be interpreted as the remainder of the
polynomial division a(x)/(x - b^j). The second
interpretation may be represented as the set of polynomial congruences

\[ a(x) \equiv a(b^j) \bmod (x - b^j); \qquad j = 0, \ldots, n-1. \tag{28} \]




The congruences of equation (28) can be calculated in principle
by dividing the polynomial a(x) separately by the first degree
polynomials (x - b^j), keeping only the remainders. That is
operationally equivalent to evaluating a(x) at the n non-zero field points
b^j. In either case n^2 multiplications in GF(p^m) are implied.

A class of fast computational algorithms (fast because they
reduce the number of multiplications in GF(p^m)) can be devised by
consideration of the different ways of factoring the polynomial x^n - 1
over GF(p^m). One way is to factor as

\[ x^n - 1 = \prod_{j=0}^{n-1} (x - b^j) \tag{29} \]

the first degree factors (x - b^j) being the modulus polynomials of
equation (28). This factorization leads to the direct computation of
transform values, requiring n^2 multiplications in GF(p^m). A block
diagram of a circuit to perform the calculation is shown in Figure 15.

Another factorization, one that reduces the number of multiplications
in GF(p^m), results from a successive decomposition of x^n - 1 into
factors of the form (x^{2k} - b^{2ℓ}). Observe that for p > 2 and n = p^m - 1
we can always establish that b^{n/2} = -b^0; therefore

\[ (x^k - b^{\ell})(x^k - b^{n/2+\ell}) = (x^k - b^{\ell})(x^k + b^{\ell}) = (x^{2k} - b^{2\ell}). \tag{30} \]

The polynomial x^n - 1 can be progressively factored in this manner, the
factorization being represented conveniently as a binary tree, as shown
in Figure 16 for x^80 - 1 factored over GF(3^4). It is easy to show that
the evaluation at one of the roots b^j only requires processing along
one of the distinct tree paths. This tends to reduce the number of
multiplications in GF(p^m) from n^2 to something on the order of n log2 n.
Notice that we can write the division algorithms

\[ a(x) = P_1(x)\, Q_1(x) + r_1(x) \tag{31a} \]

\[ r_1(x) = P_2(x)\, Q_2(x) + r_2(x) \tag{31b} \]

and after substitution

\[ a(x) = P_1(x)\, Q_1(x) + P_2(x)\, Q_2(x) + r_2(x). \tag{32} \]

If P_2(x) divides P_1(x), we can write equation (32) as

\[ a(x) = P_2(x) \left[ \frac{P_1(x)}{P_2(x)}\, Q_1(x) + Q_2(x) \right] + r_2(x) \tag{33} \]

which demonstrates that the remainder r_2(x) can be calculated progressively
by dividing a(x) by P_1(x) and then dividing the first remainder
r_1(x) by P_2(x). The sequence can be continued indefinitely as we
progress along a path in the tree. This type of decomposition is analogous
to the decimation-in-frequency FFT algorithm.
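The successive-remainder idea can be sketched in a few lines. For brevity the sketch assumes the prime field GF(17) with b = 3 (an element of order n = 16), so that the binary tree runs all the way down to first degree factors; the report's own example is n = 80 over GF(3^4). The names transform_direct and transform_tree are illustrative.

```python
P, B, N = 17, 3, 16          # assumed toy field: 3 has order 16 modulo 17, and 3^8 = -1

def transform_direct(a):
    """A_j = a(B^j), evaluated term by term."""
    return [sum(a[i] * pow(B, i * j, P) for i in range(N)) % P for j in range(N)]

def transform_tree(a):
    """Reduce a(x) along the factor tree of x^N - 1 and count multiplications."""
    out = [0] * N
    mults = 0

    def descend(r, e):
        # r holds a(x) mod (x^k - B^e), where k = len(r)
        nonlocal mults
        k = len(r)
        if k == 1:
            out[e] = r[0]            # remainder modulo (x - B^e) is a(B^e)
            return
        h = k // 2
        lo, hi = r[:h], r[h:]
        # x^k - B^e = (x^h - B^(e/2)) (x^h - B^(e/2 + N/2)), as in equation (30)
        for root in (e // 2, e // 2 + N // 2):
            c = pow(B, root, P)
            mults += h
            # x^h is congruent to c, so lo + hi * x^h reduces to lo + c * hi
            descend([(l + c * u) % P for l, u in zip(lo, hi)], root)

    descend([x % P for x in a], 0)
    return out, mults

values, count = transform_tree(list(range(1, 17)))
print(values == transform_direct(list(range(1, 17))), count)
```

Each level of the tree costs N multiplications in this sketch, giving N log2 N = 64 in place of N^2 = 256 for the direct evaluation.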

The processing structure that accomplishes an 80-point transform
over GF(3^4) by use of this method is shown in Figure 17. There are
480 multiplications in GF(3^4) required; of these approximately
one-sixth are simple multiplications by ±b^0.

A further reduction in the number of multiplications in GF(p^m)
required to calculate an n-point transform is possible by consideration
of a different factorization of x^n - 1. If we factor this
polynomial into the product of the minimal polynomials of the field
elements, then we can devise a two-step algorithm in which the first
step is division by the set of minimal polynomial factors and the
second step is division of the remainder polynomials by the first
degree polynomials (x - b^j) that are factors of the minimal polynomials.
Figure 17. 80-POINT TRANSFORM OVER GF(3^4) BY SPECTRAL DECIMATION (SCHEMATIC)
Explicitly, we can factor

\[ x^n - 1 = \prod_{i} m_i(x) \tag{34} \]

where

\[ m_i(x) = \prod_{j=1}^{d_i} \left( x - b^{\, i p^{j}} \right) \tag{35} \]

are the minimal polynomials, the product in (34) running over one index i
from each set of conjugate roots and d_i being the number of distinct
conjugates of b^i. These are monic, irreducible over
GF(p), and have all coefficients in GF(p). Division by these polynomials
in the first step of the algorithm replaces multiplications
in GF(p^m) by multiplications in GF(p), which are generally much simpler
to perform. The second step of the algorithm requires multiplications
in GF(p^m) to evaluate the remainder polynomials at the points of
the field, but the number of these multiplications is greatly reduced
because there are a relatively small number of remainder polynomials,
each of degree less than the degree of field extension. The
number of multiplications in this final step could be further reduced,
at the expense of more additions, by using the field recursion that
expresses all the field elements as linear combinations of a subset
of size m.

To illustrate this second algorithm, and compare it with the
decimation-in-frequency type, we have worked out an example for an
80-point transform over GF(3^4). In Table II, we list the elements of
GF(3^4), representing them as 4-tuples over the set {-1, 0, 1} used
to represent GF(3). In Table III, we list the minimal polynomials
of GF(3^4) and their respective roots in GF(3^4). In Figure 18 we
show schematically a processing structure for calculating the transform.
The circuits that divide the input data by the minimal polynomials
are linear feedback shift registers over GF(3). In this example, the
number of multiplications in GF(3^4) is 216, while 6800 multiplications
in the base field are performed by 85 multipliers.
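The grouping of the 80 roots into minimal polynomials can be reproduced by computing the conjugate classes of exponents, that is, the orbits of k under multiplication by p = 3 modulo n = 80. The sketch below does this and, as an assumption about how the figure of 216 arises, counts d(d - 1) extension-field multiplications for each degree-4 remainder (a cubic evaluated by nested multiplication at each of its 4 roots). The function name cyclotomic_cosets is illustrative.

```python
from collections import Counter

def cyclotomic_cosets(n, p):
    """Partition 0..n-1 into orbits under k -> k*p mod n (requires gcd(n, p) = 1)."""
    seen, cosets = set(), []
    for start in range(n):
        if start in seen:
            continue
        orbit, k = [], start
        while k not in orbit:
            orbit.append(k)
            k = (k * p) % n
        seen.update(orbit)
        cosets.append(orbit)
    return cosets

cosets = cyclotomic_cosets(80, 3)
sizes = Counter(len(c) for c in cosets)

# Assumed accounting: each degree-4 class leaves a cubic remainder that is
# evaluated at 4 roots with 3 multiplications per evaluation.
second_step = sum(len(c) * (len(c) - 1) for c in cosets if len(c) == 4)
print(sizes, second_step)
```

The smallest member of each orbit (0, 1, 2, 4, 5, 7, ...) matches the subscripts used for the minimal polynomials m_i(x) in Table III.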
Table II

THE ELEMENTS OF GF(3^4)
GENERATED BY α^4 + α^3 + α^2 - α - 1

ELEMENT   REPRESENTATION     ELEMENT   REPRESENTATION

α^0         0  0  0  1       α^40        0  0  0 -1
α^1         0  0  1  0       α^41        0  0 -1  0
α^2         0  1  0  0       α^42        0 -1  0  0
α^3         1  0  0  0       α^43       -1  0  0  0
α^4        -1 -1  1  1       α^44        1  1 -1 -1
α^5         0 -1  0 -1       α^45        0  1  0  1
α^6        -1  0 -1  0       α^46        1  0  1  0
α^7         1  0 -1 -1       α^47       -1  0  1  1
α^8        -1  1  0  1       α^48        1 -1  0 -1
α^9        -1  1  0 -1       α^49        1 -1  0  1
α^10       -1  1  1 -1       α^50        1 -1 -1  1
α^11       -1 -1  1 -1       α^51        1  1 -1  1
α^12        0 -1  1 -1       α^52        0  1 -1  1
α^13       -1  1 -1  0       α^53        1 -1  1  0
α^14       -1  0 -1 -1       α^54        1  0  1  1
α^15        1  0  1 -1       α^55       -1  0 -1  1
α^16       -1  0  0  1       α^56        1  0  0 -1
α^17        1  1  0 -1       α^57       -1 -1  0  1
α^18        0 -1  0  1       α^58        0  1  0 -1
α^19       -1  0  1  0       α^59        1  0 -1  0
α^20        1 -1 -1 -1       α^60       -1  1  1  1
α^21        1  1  0  1       α^61       -1 -1  0 -1
α^22        0 -1 -1  1       α^62        0  1  1 -1
α^23       -1 -1  1  0       α^63        1  1 -1  0
α^24        0 -1 -1 -1       α^64        0  1  1  1
α^25       -1 -1 -1  0       α^65        1  1  1  0
α^26        0  0 -1 -1       α^66        0  0  1  1
α^27        0 -1 -1  0       α^67        0  1  1  0
α^28       -1 -1  0  0       α^68        1  1  0  0
α^29        0  1 -1 -1       α^69        0 -1  1  1
α^30        1 -1 -1  0       α^70       -1  1  1  0
α^31        1  1  1  1       α^71       -1 -1 -1 -1
α^32        0  0 -1  1       α^72        0  0  1 -1
α^33        0 -1  1  0       α^73        0  1 -1  0
α^34       -1  1  0  0       α^74        1 -1  0  0
α^35       -1  1 -1 -1       α^75        1 -1  1  1
α^36       -1  0  1 -1       α^76        1  0 -1  1
α^37        1 -1  1 -1       α^77       -1  1 -1  1
α^38        1  0  0  1       α^78       -1  0  0 -1
α^39       -1 -1 -1  1       α^79        1  1  1 -1

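Table III

MINIMAL POLYNOMIAL FACTORS OF x^80 - 1
(SPLITTING FIELD: GF(3^4))

Polynomial

m_0(x)  = x - 1
m_1(x)  = x^4 + x^3 + x^2 - x - 1
m_2(x)  = x^4 + x^3 + x^2 + 1
m_4(x)  = x^4 + x^3 - x + 1
m_5(x)  = x^4 + x^2 - 1
m_7(x)  = x^4 + x^3 - 1
m_8(x)  = x^4 - x^3 + x^2 - x + 1
m_10(x) = x^2 + x - 1
m_11(x) = x^4 + x - 1
m_13(x) = x^4 - x^3 - x^2 + x - 1
m_14(x) = x^4 - x^3 + x^2 + 1
m_16(x) = x^4 + x^3 + x^2 + x + 1
m_17(x) = x^4 - x - 1
m_20(x) = x^2 + 1
m_22(x) = x^4 + x^2 - x + 1
m_23(x) = x^4 - x^3 - 1
m_25(x) = x^4 - x^2 - 1
m_26(x) = x^4 + x^2 + x + 1
m_40(x) = x + 1
m_41(x) = x^4 - x^3 + x^2 + x - 1
m_44(x) = x^4 - x^3 + x + 1
m_50(x) = x^2 - x - 1
m_53(x) = x^4 + x^3 - x^2 - x - 1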


Figure 18. SCHEMATIC OF AN 80-POINT TRANSFORM OVER GF(3^4)
BY FAST POLYNOMIAL EVALUATION


Some of the well-known results of number theory can be used to
enumerate the irreducible polynomials of degree d over
GF(p), which in turn allows us to enumerate the minimal
polynomials of each degree and consequently the number of required
multiplications in GF(p^m). This allows us to assess the complexity
of the algorithm for a number of different cases without explicitly
determining the structure. The numbers of required multiplications over
GF(p^m) are enumerated in Table IV for a number of different cases,
and the numbers are plotted in Figure 19 to compare the trend with
the N log2 N behavior of the FFT class of algorithms.
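One such number-theoretic result is the classical counting formula for monic irreducible polynomials of degree d over GF(p), namely (1/d) times the sum over divisors k of d of μ(d/k) p^k, where μ is the Möbius function. The report does not say which formula was used, so the sketch below is offered only as one way the enumeration could be carried out; the function names are illustrative.

```python
def mobius(n):
    """Moebius function: 0 if n has a squared prime factor, else (-1)^(number of primes)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def num_irreducible(p, d):
    """Number of monic irreducible polynomials of degree d over GF(p)."""
    total = sum(mobius(d // k) * p ** k for k in range(1, d + 1) if d % k == 0)
    return total // d

# Degrees dividing m = 4 over GF(3); the degree-1 count includes the polynomial x.
for d in (1, 2, 4):
    print(d, num_irreducible(3, d))
```

For GF(3^4) this gives 18 polynomials of degree 4 and 3 of degree 2, in agreement with the conjugate classes of the 80-point example; of the 3 first degree polynomials, only x - 1 and x + 1 divide x^80 - 1.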

It is possible to pursue the idea of factoring x^n - 1 to devise a
further simplification that reduces the number of multiplications
over GF(p) that need to be performed. In particular, we can first
factor x^n - 1 into the product of the cyclotomic polynomials Q^(k)(x)
where the indices k are divisors of n. Thus, for example

\[ x^{80} - 1 = Q^{(80)}(x)\, Q^{(40)}(x)\, Q^{(20)}(x)\, Q^{(16)}(x)\, Q^{(10)}(x)\, Q^{(8)}(x)\, Q^{(5)}(x)\, Q^{(4)}(x)\, Q^{(2)}(x)\, Q^{(1)}(x) \tag{36} \]

The cyclotomic polynomials for this example are listed in Table V.
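Equation (36) can be confirmed by multiplying out the ten cyclotomic polynomials with ordinary integer coefficients. The sketch below stores each polynomial as a coefficient list, low order first; the helper names poly_mul and sparse are illustrative.

```python
def poly_mul(f, g):
    """Multiply two integer polynomials given as coefficient lists (low order first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                out[i + j] += a * b
    return out

def sparse(terms):
    """Build a coefficient list from a {exponent: coefficient} mapping."""
    return [terms.get(e, 0) for e in range(max(terms) + 1)]

# Cyclotomic polynomials Q^(k)(x) for the divisors k of 80.
Q = {
    1: sparse({1: 1, 0: -1}),
    2: sparse({1: 1, 0: 1}),
    4: sparse({2: 1, 0: 1}),
    5: sparse({4: 1, 3: 1, 2: 1, 1: 1, 0: 1}),
    8: sparse({4: 1, 0: 1}),
    10: sparse({4: 1, 3: -1, 2: 1, 1: -1, 0: 1}),
    16: sparse({8: 1, 0: 1}),
    20: sparse({8: 1, 6: -1, 4: 1, 2: -1, 0: 1}),
    40: sparse({16: 1, 12: -1, 8: 1, 4: -1, 0: 1}),
    80: sparse({32: 1, 24: -1, 16: 1, 8: -1, 0: 1}),
}

product = [1]
for k in sorted(Q):
    product = poly_mul(product, Q[k])

expected = [-1] + [0] * 79 + [1]      # x^80 - 1
print(product == expected)
```

Since the check is carried out over the integers, the same factorization holds after reduction modulo any prime p.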

In general, these polynomials have coefficients that are either
0 or ±1 for every index below 105. The non-zero coefficients are typically
sparse. Each of the polynomials Q^{(d)}(x) can be factored over GF(p^m)
into the product of the minimal polynomials of the field elements of
order d, so that the cyclotomic factorization, if it precedes the second
fast algorithm described above, has the effect of reducing the
number of multiplications in GF(p) that are not multiplications by
the set {0, 1, -1} (which are also the elements of GF(3) by
coincidence). The factorization of x^n - 1 into cyclotomic polynomials
depends only on n and is otherwise independent of the field over which
the transform is being calculated. The processor structure for this
first step can therefore be field-independent. As an example of the
TABLE IV

MULTIPLICATIVE COMPLEXITY OF DISCRETE
TRANSFORM ALGORITHM BY FAST POLYNOMIAL EVALUATION

FIELD         NUMBER OF TRANSFORM    NUMBER OF EXTENSION    N log_2 N
              POINTS (N)             FIELD PRODUCTS

GF(2^5)            31                      120                  154
GF(2^7)           127                      756                1,214
GF(2^8)           255                    1,719                2,038
GF(2^{11})      2,047                   20,461               22,517
GF(3^3)            26                       50                  122
GF(3^4)            80                      224                  505
GF(3^5)           242                      962                1,916
GF(3^7)         2,186                   13,106               24,253
GF(5^2)            24                       24                  110
GF(5^3)           124                      247                  862
GF(5^5)         3,124                   12,484               36,267

Figure 19. MULTIPLICATIVE COMPLEXITY OF POLYNOMIAL EVALUATION
DISCRETE TRANSFORM ALGORITHM
(Horizontal axis: N, number of transform points. Plotted: points for
N = 2^m - 1 and N = 3^m - 1, and the curve N log_2 N.)


TABLE V

CYCLOTOMIC FACTORS OF x^{80} - 1

CYCLOTOMIC POLYNOMIAL                                  IRREDUCIBLE FACTORS

Q^{(1)}(x)  = x - 1                                    m_{80}(x) = m_0(x)
Q^{(2)}(x)  = x + 1                                    m_{40}(x)
Q^{(4)}(x)  = x^2 + 1                                  m_{20}(x)
Q^{(5)}(x)  = x^4 + x^3 + x^2 + x + 1                  m_{16}(x)
Q^{(8)}(x)  = x^4 + 1                                  m_{10}(x), m_{50}(x)
Q^{(10)}(x) = x^4 - x^3 + x^2 - x + 1                  m_8(x)
Q^{(16)}(x) = x^8 + 1                                  m_5(x), m_{25}(x)
Q^{(20)}(x) = x^8 - x^6 + x^4 - x^2 + 1                m_4(x), m_{44}(x)
Q^{(40)}(x) = x^{16} - x^{12} + x^8 - x^4 + 1          m_2(x), m_{14}(x), m_{22}(x), m_{26}(x)
Q^{(80)}(x) = x^{32} - x^{24} + x^{16} - x^8 + 1       m_1(x), m_7(x), m_{11}(x), m_{17}(x), ...

See Table III


use of these polynomials, we have shown in Figure 20 a set of
dividers that can precede the structure of Figure 18. In this case,
there is only a slight decrease in the number of multiplications in
GF(3). As a second example, we show in Figure 21 the complete structure
for a 24-point transform in GF(5^2). In this case, the number of
multiplications in GF(5) that are not multiplications by {0, 1, -1} has been
reduced from 218 to 64 by incorporating the cyclotomic factorization step.

3.4  Cyclic  Convolution 

The  successful  application  of  CCD  multi-level  logic  to  digital 
signal  processing  will  depend,  among  other  things,  on  the  ability  to 
devise  means  of  utilizing  CCD  structures  that  are  either  relatively 
easy  to  fabricate  or  are  minor  modifications  of  existing  devices. 
Structures  such  as  tapped  delay  lines  and  programmable  transversal 
filters  fall  into  this  category,  but  they  have  certainly  not  been 
developed  in  anticipation  of  multiple-valued  digital  operation  such 
as  we  envision.  Below  we  discuss  an  idea  for  utilizing  the  generic 
binary-programmable  transversal  filter  structure  for  performing 
finite-field cyclic convolution. In later sections, we will expand
the idea for other applications.

Convolution  is  a  frequently  encountered  signal  processing 
operation. The cyclic convolution of two n-point sequences with
elements belonging to GF(p) can be regarded as the product, modulo
x^n - 1, of two polynomials of degree n-1 having coefficients in GF(p).
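
The polynomial-product view of cyclic convolution can be made concrete with a short sketch. The function name and the example sequences are assumptions chosen for illustration; coefficients are listed lowest power first, and reduction modulo x^n - 1 appears as the wrap-around of the exponent.

```python
def cyclic_convolve(a, b, p):
    """Cyclic convolution of two n-point sequences over GF(p).

    Equivalent to the product a(x) b(x) modulo x^n - 1, with the
    coefficients reduced modulo p.
    """
    n = len(a)
    assert len(b) == n, "both sequences must have n points"
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = (i + j) % n               # x^(i+j) reduced modulo x^n - 1
            out[k] = (out[k] + ai * bj) % p
    return out


a = [1, 2, 0, 4]
b = [3, 0, 1, 2]
print(cyclic_convolve(a, b, 5))
print(cyclic_convolve(a, [0, 1, 0, 0], 5))    # multiplying by x rotates a
```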

It  seems  likely  that  the  type  of  binary-programmable  transversal 
filter  useful  for  PN-sequence  matched  filtering,  developed  for 
correlating  an  analog  signal  against  a  stored  binary  reference,  can 
also  be  used  for  finite-field  convolution.  Just  as  it  is  possible 
to  perform  an  analog-analog  correlation  by  A/D  conversion  of  the 
reference  followed  by  parallel  correlation  in  several  analog-binary 
devices,  so  is  it  possible  to  partition  the  finite-field  operations 
among  a  set  of  elementary  correlators. 




If we represent the elements of GF(p) by the set
{-(p-1)/2, ..., -1, 0, +1, ..., +(p-1)/2}, then we can treat the
multiplication of an element b by another element a as the elementary
process of summing b to itself |a| times, with the sign of a applied.
For cyclic convolution, we
may use such a representation to form the various products over GF(p)
in a set of tapped delay lines that apply the tap weights
0, ±1 and accumulate the partial product in each component device
before combining their outputs. Of course, the accumulated sum must
be reduced modulo p for polynomial multiplication, but this can be
done separately at the output of each correlator, as well as at the
final output, in order to limit the dynamic range.
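
The partition into elementary correlators can be sketched in software. This is a behavioral model only, under the assumption that the stored sequence is split into (p-1)/2 layers of tap weights 0, ±1 whose sum recovers the balanced representation; the function names are hypothetical and no claim is made about the actual device layout.

```python
def centered(v, p):
    """Balanced representative of v in {-(p-1)/2, ..., +(p-1)/2}."""
    h = (p - 1) // 2
    return (v + h) % p - h


def ternary_layers(ref, p):
    """Split a GF(p) sequence into (p-1)/2 layers of tap weights 0, +1, -1."""
    h = (p - 1) // 2
    c = [centered(v, p) for v in ref]
    return [[((x > 0) - (x < 0)) if abs(x) >= k else 0 for x in c]
            for k in range(1, h + 1)]


def layered_cyclic_convolve(a, b, p):
    """Cyclic convolution over GF(p) using only additions and subtractions."""
    n = len(a)
    total = [0] * n
    for weights in ternary_layers(b, p):
        partial = [0] * n
        for j, w in enumerate(weights):
            if w == 0:
                continue                  # zero tap: sample routed to neither bus
            for i, ai in enumerate(a):
                partial[(i + j) % n] += ai if w > 0 else -ai
        # Reduce modulo p at the output of each component correlator
        # (Python's % returns a non-negative result for positive p).
        total = [(t + s % p) % p for t, s in zip(total, partial)]
    return total


print(ternary_layers([3, 0, 1, 2], 5))
print(layered_cyclic_convolve([1, 2, 0, 4], [3, 0, 1, 2], 5))
```

Because the layers sum to the balanced form of the stored sequence, the combined output agrees with the direct product modulo x^n - 1.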

A  structure  that  correlates  a  sequence  over  GF(5)  with  the 
m-sequence generated by the primitive polynomial α^2 + α + 2 is shown
schematically in Figure 22. The m-sequence is also shown for reference.
In this diagram it is assumed that the zero-value of the
signal is represented by a charge value Q_0 that is at the mid-range
of  a  full  well.  Signal  charges  weighted  by  +1  are  routed  to  the 
positive  summing  bus  while  those  weighted  by  -1  are  routed  to  the 
negative  summing  bus.  For  a  zero  tap  weight  the  signal  charge  is 
not  routed  to  either  bus.  We  assume,  of  course,  that  the  charge 
sensing  is  nondestructive. 

An  application  suggested  by  the  apparatus  of  Figure  22  is  a 
matched  filter  detector  for  PN  sequences  defined  over  GF(p).  In 
such  an  application,  the  modulo  p  reduction  is  not  needed  as  the 
cross-correlation  is  formed  in  the  ordinary  number  field.  The 
autocorrelation  function  of  the  m-sequence  used  is  shown  in  Figure  23. 
In  this  figure,  we  have  also  shown  the  cross-correlation  of  the 
m-sequence  with  the  reference  used  in  the  central  correlator,  which 
is  simply  a  hard-limited  version  of  the  m-sequence.  We  see  that  the 
full correlation produces an improvement of 8 dB relative to the
hard-limited  version  (the  output  of  the  central  correlator),  as  well 
as  suppressing  the  positive  sidelobes. 
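
The two correlations compared above can be reproduced in outline. The sketch below assumes the recurrence s[k+2] = -(s[k+1] + 2 s[k]) mod 5 implied by the polynomial α^2 + α + 2, an arbitrary initial state (0, 1), and the balanced representation {-2, ..., +2}; it does not attempt to reproduce Figure 23 or the quoted decibel figure, which depend on how the sidelobes are measured.

```python
def m_sequence_gf5(length=24, state=(0, 1)):
    """m-sequence over GF(5) for the polynomial x^2 + x + 2 (period 24)."""
    p = 5
    s = list(state)
    while len(s) < length:
        s.append((-(s[-1] + 2 * s[-2])) % p)
    return s[:length]


def centered(v, p=5):
    """Balanced representative of v in {-(p-1)/2, ..., +(p-1)/2}."""
    h = (p - 1) // 2
    return (v + h) % p - h


def periodic_correlation(signal, reference):
    """Periodic cross-correlation formed in the ordinary integers."""
    n = len(signal)
    return [sum(signal[(i + shift) % n] * reference[i] for i in range(n))
            for shift in range(n)]


seq = [centered(v) for v in m_sequence_gf5()]
hard = [(x > 0) - (x < 0) for x in seq]        # hard-limited reference

full = periodic_correlation(seq, seq)          # all correlators combined
central = periodic_correlation(seq, hard)      # central correlator only

print("full correlation:   ", full)
print("central correlator: ", central)
```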


An  additional  refinement  to  the  apparatus  could  be  made  if  it 
is not desired to represent the zero-value of the signal by a positive
charge. In that case, positive and negative signal samples could
be  detected  and  separately  correlated  in  sets  of  correlator-banks, 
each  bank  operating  only  on  positive  signal  samples  and  the  zero-value 
being  represented  by  the  bias  charge  (fat  zero)  setting  the  minimum 
charge  level  for  the  wells.  This  approach  would  require  twice  as 
many  correlators,  but  the  dynamic  range  requirements  would  be  relaxed 
somewhat. 

3.5  Polynomial  Division  with  Transversal  Structures

As we have shown in Section 3.3, division by polynomials over
GF(p) is an important step in processing to calculate the discrete
transform of an input sequence defined over GF(p^m). The polynomial
dividers described in that section implemented the division algorithm

A(x) = P(x) Q(x) + R(x)        (37)

to divide the polynomial A(x) by the polynomial P(x). It was assumed
that A(x) was a polynomial over GF(p^m) while the divider polynomial
P(x) was defined over GF(p), the division being carried out simultaneously
by a set of m identical linear feedback shift registers.

It  will  suffice  to  consider  just  one  of  these  shift  registers,  treating 
its  input  as  a  sequence  over  GF(p).  It  is  appropriate  then  to  write 
the  division  algorithm  as 

a(x)  =  P(x)  q(x)  +  r(x)  (38) 

where  all  of  the  polynomials  are  defined  over  GF(p).  The  m-ary 
case  is  carried  out  by  m  such  divisions  in  parallel. 

An inconvenience for CCD implementation of the division register
used to implement equation (38) is that charge summation is required
in certain stages determined by the divider polynomial. This has
several drawbacks. First of all, the summation must be defined
modulo p if an excessive buildup of charge along the register is
to be avoided. This complicates the structure of the shift register.
Although modulo p adders can be designed, as shown conceptually in
Section 3.2, it seems preferable to separate the adder from the
delay line. In that case, a transversal structure seems more
appropriate.

The polynomial divider that implements equation (38) may be
regarded as a digital filter whose input is the polynomial a(x) and
whose output is the quotient q(x). The feedback taps are determined
by the divisor P(x) and the remainder r(x) is left in the register
after the input a(x) has been processed. The filter can be transposed
into a transversal form by using signal flow-graph techniques
(reverse all paths, exchange adders and path nodes, exchange input
and output nodes). The transversal filter implements the same input-output
function as the original divider; in other words, it has the same
unit pulse response. An example of a divider-network filter and its
transposed version is shown in Figure 24. Although the two filters
have the same unit pulse response, the circuit state as represented
by the register contents differs on each cycle. For our application
of polynomial division, it is the remainder polynomial r(x) representing
the final state that is of principal interest, so the transversal
filter structure cannot be used directly.

We may, however, use the transversal filter approach to polynomial
division to determine the remainder in two steps; the first
step is to determine the quotient q(x) and the second step is used
to calculate

r(x) = a(x) - P(x) q(x)        (39)

which is the needed result. The transposed divider circuit is
used to find the quotient, which is then multiplied by the divider
polynomial in a second transversal filter, and the product is subtracted
from the suitably delayed input sequence. A transversal
structure that implements division of a 24-point sequence by a
cyclotomic polynomial Q^{(k)}(x) is shown in Figure 25 as an example
of the method. In comparison with the canonic LFSR divider, additional
circuitry is required, as well as additional processing time,
but the tapped delay line transversal structure seems more convenient
to implement with available CCD techniques, making the tradeoff a
reasonable one.
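
The two-step procedure of equation (39) can be modeled behaviorally. The sketch below is not the circuit of Figure 24 or Figure 25; it assumes a monic divisor, coefficients listed highest power first, and a tapped delay line that holds past quotient symbols with a single modulo p summation at its output. The function names and the example divisor x^4 + 1 are chosen for illustration only.

```python
def quotient_tapped_delay_line(a_hi, P_hi, p):
    """Step 1: quotient q(x), one symbol per input sample.

    a_hi, P_hi: coefficients, highest power first; P_hi[0] must be 1.
    """
    d = len(P_hi) - 1
    delay = [0] * d                       # past quotient symbols, newest first
    q = []
    for sample in a_hi[:len(a_hi) - d]:
        out = (sample - sum(c * x for c, x in zip(P_hi[1:], delay))) % p
        q.append(out)
        delay = [out] + delay[:-1]
    return q


def remainder_two_step(a_hi, P_hi, p):
    """Step 2: r(x) = a(x) - P(x) q(x), product formed in a second filter."""
    d = len(P_hi) - 1
    q = quotient_tapped_delay_line(a_hi, P_hi, p)
    product = [0] * len(a_hi)
    for i, qi in enumerate(q):
        for j, c in enumerate(P_hi):
            product[i + j] = (product[i + j] + qi * c) % p
    diff = [(x - y) % p for x, y in zip(a_hi, product)]
    return q, diff[len(diff) - d:]        # the d low-order coefficients


def remainder_long_division(a_hi, P_hi, p):
    """Reference result: ordinary long division, as in an LFSR divider."""
    work = list(a_hi)
    d = len(P_hi) - 1
    for i in range(len(work) - d):
        lead = work[i] % p
        for j, c in enumerate(P_hi):
            work[i + j] = (work[i + j] - lead * c) % p
    return work[len(work) - d:]


# Small case: x^3 + 2x + 1 divided by x^2 + 1 over GF(5).
print(remainder_two_step([1, 0, 2, 1], [1, 0, 1], 5))

# A 24-point sequence over GF(5) divided by x^4 + 1.
a24 = [(3 * k * k + k + 1) % 5 for k in range(24)]
q24, r24 = remainder_two_step(a24, [1, 0, 0, 0, 1], 5)
print(r24 == remainder_long_division(a24, [1, 0, 0, 0, 1], 5))
```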

3.6  Galois  Field  Representation  With  m-Sequences 

A finite field processing operation that arises frequently is
multiplication of pairs of elements of GF(p^m). For example, this
operation was a major concern in the development of a fast algorithm
for calculating an N-point discrete transform in GF(p^m) as discussed
in Section 3.3. One method of performing the multiplication is to
use a linear sequential circuit designed over GF(p) to multiply a
data sample α^i by a constant α^k, the feedback and feedforward connections
being determined by the scale factor and the data providing
the initial loading of the shift register. The product is formed by
shifting the register once. This technique was described in
Section 3.1. Although the operations required are additions and
multiplications in the prime field GF(p) rather than in the extension
field GF(p^m), it was evident that a number of adders and multipliers
are required and that the sequential circuit structure is quite
different for each scale factor.

If both the field characteristic and degree of extension are
small, then the possibility presents itself of representing the
field elements by distinct cyclic shifts of the m-sequence generated
by the primitive generator of the field. This can have dramatic
effect in reducing hardware complexity for implementing the arithmetic
operations at the expense of increased sequential processing. The
time-expansion factor for sequential multiplication is (p^m - 1)/m,
which practically restricts the representation to small values of
both p and m.

The transformation (h: α → β) that maps the elements of GF(p^m)
into cyclically shifted m-sequences is a bijective mapping that
maps the identity element into itself for both the additive and
multiplicative groups of the field, thus causing the group transformations
to be group isomorphisms. The practical consequences of
this algebraic statement are that multiplication of two elements
β^i and β^j is accomplished by cyclically shifting the element β^i j
times (or the element β^j i times), and that addition of β^i and
β^j is accomplished by the component-wise sum, modulo p, of their
(p^m - 1)-tuple representations. For addition of the elements β^i, as
in the case of the elements α^i, no carry operations are required.

Multiplication of a data sample α^i by a constant factor α^k can
be performed by circularly shifting the data in a recursive loop
N = p^m - 1 times while reading out the product at a tap determined
by the multiplier α^k, as shown in Figure 26. In this figure, we also
show the addition of another data sample to the product to implement
the first degree function α^k x_i + x_j = α^k α^i + α^j. The adder
shown must be a modulo p adder, but only one of these is required
since the function is formed sequentially.
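
The m-sequence representation can be exercised on a small field. The sketch below assumes GF(5^2) with the polynomial x^2 + x + 2 used earlier and an arbitrary initial state (0, 1); all names are hypothetical. An element β^i is stored as the m-sequence shifted i places, multiplication by β^j is a further cyclic shift, and addition is a component-wise sum modulo p. The result is compared with conventional arithmetic on coefficient pairs (a0, a1) standing for a0 + a1 α.

```python
P = 5
N = P ** 2 - 1                            # 24 non-zero elements of GF(5^2)


def m_sequence(state=(0, 1)):
    """One period of the m-sequence for x^2 + x + 2 over GF(5)."""
    s = list(state)
    while len(s) < N:
        s.append((-(s[-1] + 2 * s[-2])) % P)
    return s


SEQ = m_sequence()
ZERO = (0,) * N                           # representation of the zero element


def rep_of_power(i):
    """N-tuple representing beta^i: the m-sequence shifted i places."""
    return tuple(SEQ[(k + i) % N] for k in range(N))


def shift(u, j):
    """Multiply the represented element by beta^j (cyclic shift)."""
    j %= N
    return u[j:] + u[:j]


def add(u, v):
    """Add two represented elements: component-wise sum modulo p, no carries."""
    return tuple((x + y) % P for x, y in zip(u, v))


def alpha_powers():
    """Conventional representation: alpha^i as pairs (a0, a1)."""
    elems = [(1, 0)]
    for _ in range(N - 1):
        a0, a1 = elems[-1]
        # (a0 + a1 x) x reduced with x^2 = -x - 2
        elems.append(((-2 * a1) % P, (a0 - a1) % P))
    return elems


POWERS = alpha_powers()
LOG = {e: i for i, e in enumerate(POWERS)}

# Example: alpha^3 * alpha^7 + alpha^2 in both representations.
lhs = add(shift(rep_of_power(3), 7), rep_of_power(2))
a, b = POWERS[10], POWERS[2]
conventional = ((a[0] + b[0]) % P, (a[1] + b[1]) % P)
print(lhs == (ZERO if conventional == (0, 0) else rep_of_power(LOG[conventional])))
```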

A pair of circuits of the type shown in Figure 26 could be interconnected
to operate alternately, to perform the function of polynomial
evaluation. This operation is important in the computation of the
discrete transform even when a fast algorithm is employed, as discussed
in Section 3.3.
Figure 26. FINITE FIELD REPRESENTATION BY M-SEQUENCE
SCALAR MULTIPLICATION AND ADDITION


The  output  of  the  polynomial  circuit  of  Figure  2f  is  in  the 
m-sequence  representation  and  needs  to  be  transformed  back  to  the 
compressed representation eventually, although further computations
can  be  performed  in  the  m-sequence  represented  field.  A  number  of 
methods  suggest  themselves  for  the  inverse  transformation,  but  the 
correlation  detection  approach  seems  most  appropriate,  and  raises 
the  distinct  possibility  of  incorporating  a  degree  of  fault  tolerance 
into  the  operation.  A  method  of  correlation  detection  was  discussed 
above  in  Section  3.4. 

The  m-sequence  representation  was  introduced  with  the  motive 
of  reducing  hardware  complexity  while  utilizing  generic  CCD  functions 
based  on  shift  register  operations  and  transversal  structures.  It 
is apparent that some of the operations are relevant both to error
coding and to spread-spectrum matched filtering. That raises the
possibility that in systems which combine these signal processing
functions (such as JTIDS), it may be possible to utilize the m-sequence
representation to combine some of the processing functions or
hardware elements, or both, used for PN sequence demodulation and
Reed-Solomon decoding; and perhaps it could be used to inject a
degree of fault tolerance into the hardware. The subject merits
further  exploration  as  an  area  of  application  for  some  of  the  techniques 
discussed  above. 